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Abstract. Neutrophil gelatinase-associated lipocalin (NGAL) is 
a member of the lipocalin superfamily; dysregulated expression 
of NGAL has been observed in several benign and malignant 
diseases. In the present study, differentially expressed genes, in 
comparison with those of control cells, in the mRNA expression 
profile of EC109 esophageal squamous cell carcinoma (ESCC) 
cells following NGAL overexpression were analyzed by multiple 
bioinformatic tools for a comprehensive understanding. A total 
of 29 gene ontology (GO) terms associated with immune func- 
tion, chromatin structure and gene transcription were identified 
among the differentially expressed genes (DEGs) in NGAL 
overexpressing cells. In addition to the detected GO categories, 
the results from the functional annotation chart revealed that 
the differentially expressed genes were also associated with 
101 functional annotation category terms. A total of 59 subpath- 
ways associated locally with the differentially expressed genes 
were identified by subpathway analysis, a markedly greater 
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total that detected by traditional pathway enrichment analysis 
only. Promoter analysis indicated that the potential transcrip- 
tion factors Snail, deltaEFl, Mycn, Arnt, MNBIA, PBF, E74A, 
Ubx, SPIl and GATA2 were unique to the downregulated DEG 
promoters, while bZIP910, ZNF42 and S0X9 were unique for 
the upregulated DEG promoters. In conclusion, the under- 
standing of the role of NGAL overexpression in ESCC has been 
improved through the present bioinformatic analysis. 

Introduction 

Neutrophil gelatinase-associated lipocalin (NGAL), also termed 
lipocalin2, is a member of the lipocalin superfamily, which 
includes >20 members (1). NGAL is secreted extracellularly and 
forms a heterodimer with matrix metalloproteinase-9 (MMP-9) 
through disulfide bonds protecting against degradation (2). 
NGAL tightly binds to the bacterial siderophore, possibly 
serving as a potent bacteriostatic agent by sequestering iron 
as well as regulating innate immunity and inflammation (3). 
Overexpression of NGAL has also been observed in various 
types of human cancer, including breast, colorectal, pancreatic, 
ovarian, gastric, thyroid, ovarian, bladder and kidney cancer (4). 
Previous studies have shown that NGAL is upregulated in 
esophageal squamous cell carcinoma (ESCC) and is an inde- 
pendent prognostic factor; this upregulation was significantly 
correlated with cell differentiation and tumor invasion (5,6). 

However, controversial results have been observed regarding 
the functional role of NGAL in various types of cancer cell. 
For example, NGAL was able to facilitate gastrointestinal 
mucosal regeneration by promoting cell motility and invasion 
and to reduce E-cadherin mediated cell-cell adhesion in colon 
cancer (7). NGAL was demonstrated to be highly expressed in 
human thyroid carcinomas, and NGAL knockdown inhibited 
cancer cell growth in soft agar and the formation of tumors in 
nude mice (8). Conversely, in pancreatic cancer cells, NGAL 
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reduced adhesion/invasion partly through suppressing focal 
adhesion kinase activation and Inhibited anglogenesls partly 
by blocking vascular endothelial growth factor production (9). 

In the present study, to examine the biological role of NGAL 
In ESCC, NGAL was overexpressed In the EC109 ESCC cell line. 
An mRNA mlcroarray was performed using the Agilent whole 
genome ollgo mlcroarray to Identify differentially expressed 
genes (DEGs) In NGAL overexpresslng cells compared with 
control cells (10). Multiple blolnformatlcs analyses were 
performed on these DEGs In order to gain a comprehensive 
understanding of the role of NGAL overexpresslon In ESCC. 

Materials and methods 

Differentially expressed genes. The raw data were analyzed 
using normalization and log transformation (10). Differentially 
expressed genes were Identified using a two-fold change 
threshold. 

Gene ontology (GO) enrichment and functional annotation. 
The Database for Annotation, Visualization and Integrated 
Discovery blolnformatlcs tool (DAVID; http://davld.abcc. 
nclfcrf.gov/) was applied for GO enrichment, using category 
classes Including Biological process. Cellular component and 
Molecular function. GO Is one of the most useful methods 
for functional annotation and classification of genes. In addi- 
tion, DAVID blolnformatlcs provides a functional annotation 
chart to Identify over-represented biological terms from a 
particular gene list (11). Thus far, the functional annotation 
chart provides >40 category enrichments. Including GO terms, 
sequence features, disease associations, protein functional 
domains, protein-protein Interactions, pathways, homology, 
gene functional summaries and literature. The enriched terms 
from the functional annotation chart with P<0.05 were visual- 
ized by the Enrichment Map plugln for the Cytoscape network 
visualization software (12). 

Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway 
and subpathway analysis . The bloconductor SubpathwayMlner 
package was applied to the DEG-enrlched KEGG pathways 
Identified (12). In addition to traditional entire pathway 
enrichment, SubpathwayMlner Is able to detect subpathways, 
local regions of entire pathways, which aids In gaining more 
detailed Information regarding the relevant genes In localized 
areas of a specific pathway (13). SubpathwayMlner extracts 
multiple subpathways from an entire KEGG pathway by the 
k-cllque method. The distance between any two nodes (a node 
Indicates a gene In the pathway) In a subpathway Is not larger 
than k; k was set as 4 In the present study. 

Promoter sequence patterns and potential transcription factor 
analysis. The 2,000-bp promoter sequences of the 20 genes 
exhibiting the greatest down- and upregulatlon, respectively, 
were retrieved from the UCSC genome database (http://genome. 
ucsc.edu/). The sequence patterns over-represented or 
under-represented In these two promoter sequence sets were 
analyzed by the POCO program (http;//ekhldna.blocenter. 
helslnkl.fi/poxo/poco/poco). POCO Identifies motifs that are 
over-represented In one dataset compared with a background 
set, but under-represented In another dataset compared with the 



same background set. For the parameters In the present study, 
the background organism was set as homo_saplens_clean and 
the longest pattern length was set as 8. Subsequently, significant 
sequence patterns were screened In the JASPAR transcription 
factor database (http://jaspar.blnf.ku.dk) to Identify recognized 
transcription factors (similarity Index >0.70) (14). 

Results 

GO enrichment and functional annotation. A total of 
>200 DEGs In the NGAL overexpresslng cells were 
obtained using a two-fold change as the threshold. Including 
167 upregulated genes and 96 downregulated genes (Table I). 
To determine the functional classification of the various gene 
clusters, GO annotation was conducted using DAVID, which 
constructs statistically significant functional profiles from a 
set of genes. A total of 75, 8 and 7 significantly enriched GO 
terms were Identified for these DEGs In the Biological process. 
Cellular component and Molecular function categories, respec- 
tively (P<0.05; Fig. 1). Notably, two predominant Biological 
process term groups were Identified. One group comprised 
21 Immune-associated terms. Including response to stress, 
defense response and regulation of Immune response. The other 
group consisted of 8 terms regarding chromatin structure and 
gene transcription. Including nucleosome assembly, chromatin 
assembly and proteln-DNA complex assembly. These 8 terms 
contained the same 17 DEGs: HIST1H2AC, HIST2H2AA3, 
HIST1H2BB, HIST1H2BC, HISTIHIE, HIST1H2BD, 
HISTIHIC, HIST1H2BE, HIST1H2AG, HIST1H2BF, 
HIST1H2BG, HIST1H2AD, HIST1H2BH, HIST1H2B0, 
HIST1H2BM, H2BFS, HIST1H2BK, HIST1H2BL, 
HIST1H2BL HIST2H2AC, HIST1H3D and HIST3H2BB. The 
most significant function In the Molecular function category 
was DNA-blndlng; In addition to 17 hlstone-assoclated genes, 
this contained ZMATl, IFIHl, LM02, TFCP2L1, S0X2, TP63, 
DACHl, F0XN4, TAFll and OASL. In the Cellular component 
category, a total of 23 genes associated with the extracellular 
region were Identified: SECTMl, RBP4, A2M, C3, CFB, 
PLBDl, LGALS8, SPINK5, AP0L3, CGREFl, SLC1A3, 
ISG15, SAAl, SERPINA5, AGT, GIRL, KLKIO, IGFL2, AGRN, 
SEPPl AREG, CASPl and DEFBL 

The DEGs were also clustered using the Functional anno- 
tation chart In DAVID and the enrichment was visualized by 
the Enrichment Map plugln for the Cytoscape software. In 
Fig. 2, a node signifies one functional category and node size 
corresponds to the number of enriched genes. The color depth 
corresponds to the significance (P-value) of the terms. Nodes 
from the same functional category are presented as the same 
shape. Edges between nodes were depicted when overlapping 
genes existed between these two nodes. The widths of the 
lines Indicate the number of overlapping genes between the 
functional groups, which are bigger and the wider with greater 
numbers. In the 180 total Functional annotation chart enrich- 
ments Identified, In addition to 101 terms from the three GO 
categories, 78 terms from the following annotation categories 
were Included: 14 from INTERPRO, 2 from SMART, 30 from 
SP_PIR_KEYWORDS, 24 from UP_SEQ_FEATURE, 
1 from COG_ONTOLOGY, 2 from PIR_SUPERFAMILY, 
1 from OMIM_DISEASE and 4 from KEGG_PATHWAY. 
These results provided a wider overview of the biological 



1802 



MOLECULAR MEDICINE REPORTS 10: 1800-1812, 2014 



Table I. Differentially expressed genes in neutrophil gelatinase-associated lipocalin overexpressing EC109 esophageal squamous 
cell carcinoma cells, compared with control cells. 



A, Upregulated genes 



Gene symbol 


Fold change 


Gene symbol 


Fold change 


Gene symbol 


Fold chanj 


LCN2 


75.450 


C3 


2.647 


HIST1H2BD 


2.223 


BC034319 


12.960 


ABCAl 


2.627 


HIST1H2BI 


2.217 


CGREFl 


8.069 


OASL 


2.608 


RAD9B 


2.215 


SLC1A3 


7.525 


ZMATl 


2.607 


C9orf9 


2.211 


CLGN 


5.594 


UNC5B 


2.572 


BTN3A1 


2.211 


SECTMl 


5.505 


FBPl 


2.560 


HIST1H3D 


2.210 


P0PDC3 


5.469 


HIST1H2BK 


2.560 


HERC5 


2.205 


FNDC6 


5.215 


CAMS API LI 


2.559 


TSGA2 


2.204 


FXYD3 


5.102 


PFTKl 


2.553 


BC021677 


2.204 


KLKIO 


4.260 


HISTIHIE 


2.550 


HIST1H2BG 


2.203 


CHST2 


4.142 


IFIT2 


2.544 


GPNMB 


2.200 


UBD 


4.122 


KYNU 


2.530 


PLBDl 


2.187 


LGALS8 


4.084 


RADIL 


2.495 


PAGl 


2.177 


DEFBl 


3.971 


KYNU 


2.492 


G1P2 


2.175 


PLEKHA4 


3.910 


IFITl 


2.490 


TPO 


2.172 


CSAGl 


3.817 


PPAPDC3 


2.484 


HIST1H2BN 


2.163 


LGALS8 


3.719 


HIST1H2BC 


2.480 


A2M 


2.161 


SPINK5 


3.665 


ABCAl 


2.480 


CACNAID 


2.156 


Clorf38 


3.520 


CA8 


2.468 


AGT 


2.148 


RPSAPIO 


3.464 


AREG 


2.467 


NANOSl 


2.147 


PCDHB5 


3.436 


ADD3 


2.455 


SLC5A10 


2.138 


DUSP26 


3.416 


AF264621 


2.451 


CASPl 


2.138 


DDX58 


3.376 


TDRD9 


2.445 


FAM26F 


2.126 


LM02 


3.368 


LOC375010 


2.433 


PSD3 


2.116 


CFB 


3.352 


BXl 15350 


2.433 


KARCAl 


2.112 


F0XN4 


3.336 


GRAMDIC 


2.430 


CR596233 


2.099 


BTN3A3 


3.279 


SEPPl 


2.427 


HIST1H2B0 


2.094 


BF514799 


3.245 


ALPPL2 


2.416 


PLEKHA4 


2.094 


AGRN 


3.240 


SMIMl 


2.408 


AP0L3 


2.084 


KIAA0657 


3.177 


LOG 39 1566 


2.406 


BC043357 


2.076 


GLRX 


3.074 


SAAl 


2.390 


CXorf48 


2.076 


NAALAD2 


3.067 


AKR1C3 


2.379 


HIST2H2AG 


2.076 


LOG 3 89772 


3.046 


HIST2H2AA 


2.370 


AF074986 


2.074 


DACHl 


3.044 


IFI44 


2.366 


HIST1H2AD 


2.074 


TP73L 


3.026 


REEP6 


2.361 


MGC16075 


2.068 


TCEAL2 


3.016 


PREXl 


2.357 


IGFL2 


2.064 


PADU 


2.991 


SI 00 A3 


2.355 


GIRL 


2.060 


Uior 


9 QS7 
Z.Vo / 




Z .j^o 


TA PI 1 


9 OAS 


METTL7A 


2.977 


C9orf9 


2.341 


PHEX 


2.048 


SERPINA5 


2.956 


H2BFS 


2.340 


MGG45474 


2.047 


POFIB 


2.934 


RIMS4 


2.330 


ABGG2 


2.046 


SAMD9L 


2.913 


HIST1H2BH 


2.316 


ADPRHLl 


2.043 


MGC16075 


2.908 


HIST1H2BF 


2.310 


HIST1H2AG 


2.043 


RASGEFIA 


2.904 


PSMB9 


2.307 


OSTbeta 


2.042 


RBP4 


2.895 


LOC375010 


2.290 


P0PDG2 


2.042 


IP013 


2.894 


HIST1H2BE 


2.286 


RHBDU 


2.040 
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Table I. Continued. 



A, Upregulated genes 



Gene symbol 


Fold change 


Gene symbol 


Fold change 


Gene symbol 


Fold change 


TTTY22 


2.888 


FU20035 


2.285 


GPR]26 


2.037 


HISTIHIC 


2.781 


CSTA 


2.283 


TFCP2L] 


2.036 


C15orf59 


2.777 


mST]H2BB 


2.281 


PJA] 


2.035 


HERC6 


2.774 


D1030S 


2.270 


BBSS 


2.035 




Z. /Oi 


tlltji lilZDiVi 


Z .ZOo 






CSAG3A 


2.752 


MD67 


2.260 


C]8orf56 


2.028 


1F177 




1 t\iVlJ) 1 


7 7S7 




7 017 

Z. .\J 1 / 


DOCK]] 


2.729 


HKDC] 


2.253 


MIB2 


2.014 


BLOC]S] 


2.723 


ZMAT] 


2.246 


KRAS 


2.004 




7 704. 


HTiTl H7R1 


7 7^7 

z, .Z. J / 






IFIH] 


2.696 


PI '\CR4 


2.233 






jj, L/ownreguidLeu \ 


genes 










Gene symbol 


Fold change 


Gene symbol 


Fold change 


Gene symbol 


Fold change 


CHST6 


0.0931 


FOXP] 


0.370 


EGR] 


0.455 


ZNF52] 


0.110 


FOXP] 


0.381 


ANKH 


0.456 


IFI]6 


0.155 


FSTU 


0.383 


SDC2 


0.457 


1F1]6 


0.173 


NPTX] 


0.383 


RDH]0 


0.458 


DIAPH2 


0.173 


CDK6 


0.387 


XYLT] 


0.458 


SEMA5A 


0.204 


FAM2]]B 


0.391 


TGFBI 


0.458 


CENTA2 


0.205 


GPR56 


0.394 


FRY 


0.459 


CENTA2 


0.212 


OR5]B4 


0.400 


BAPX] 


0.460 


CPM 


0.216 


SERP1NE2 


0.400 


TGFB] 


0.460 


KAU 


0.228 


ATP9B 


0.400 


HBG] 


0.462 


DCN 


0.237 


FZD]0 


0.402 


ED1L3 


0.462 


CXXC4 


0.250 


NPTX] 


0.404 


SLC38A5 


0.464 


FOXQ] 


0.252 


CR597240 


0.410 


TRPM4 


0.468 


FOS 


0.268 


NFKBIZ 


0.410 


PMP22 


0.468 


C0UA3 


0.287 


CRLF] 


0.426 


FLJ2]986 


0.470 


TMEPAl 


0.294 


GPR56 


0.426 


NR4A] 


0.470 


GUCY]A2 


0.300 


ALPL 


0.428 


CPM 


0.471 


CLU 


0.312 


SQLE 


0.429 


ADAMTS5 


0.475 


LEPREL2 


0.316 


NALP] 


0.432 


HBG] 


0.476 


LYPDC] 


0.323 


TGFBR] 


0.434 


XAGE2 


0.478 


RASGRP3 


0.323 


MGC4294 


0.434 


TSHZ2 


0.479 


TMEM46 


0.342 


COL4A4 


0.435 


INSIG] 


0.479 


0LFML2A 


0.344 


KLK] 


0.435 


RGS22 


0.484 


RGS5 


0.359 


FAM]0]B 


0.437 


KCNQ] 


0.485 


PLAT 


0.360 


SLC7A]3 


0.439 


HMGCS] 


0.490 


MASK 


0.363 


PGCP 


0.441 


MYO]A 


0.490 


TMEPAl 


0.367 


LOC]55060 


0.451 


HMGCS] 


0.490 


SCARA3 


0.368 


MCTP2 


0.452 


LOC284542 


0.494 


DISC] 


0.369 


SGK 


0.452 


PTPRB 


0.496 


SPFH2 


0.370 


FN] 


0.455 
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Figure 1. Gene ontology enrichment of the differentially expressed genes from the mRNA expression profile of neutrophil gelatinase-associated lipocalin 
overexpressing EC109 esophageal squamous cell carcinoma cells, when compared with control cells. The terms from the Biological process. Cellular compo- 
nent and Molecular function categories are signified by different patterned bar graphs. Immune-associated terms and histone-associated terms are indicated 
by triangles and squares, respectively. 



impact of NGAL overexpression in ESCC tiian traditional GO 
enrichment. Five DEGs were identified in the Homo sapiens 
(hsa)04350:TGF-beta signaling pathway term, including 
SMAD9,ACVRL1, TGFBR1,DCN and TGFBl. The autosomal 
recessive Alport syndrome is a genetic condition characterized 
by kidney disease, hearing loss and eye abnormalities. The 
majority of affected individuals experience progressive loss of 
kidney function, usually resulting in end-stage kidney disease. 
This disease was detected in the OMIM_DISEASE category 
containing two risk genes, COL4A4 and COL4A3 (15). In the 
SP_PIR_KEYWORDS category, 67 genes were enriched when 
using the Signal term. In addition, 35 genes were observed to 
be enriched using the Secreted term in SP_PIR_KEYWORDS. 
Four genes (Ci, SAAl, CFB and FNl) were enriched in the 
acute phase term in SP_PIR_KEY WORDS. The ubl conjuga- 
tion term in SP_PIR_KEYWORDS contained 27 genes; in 



addition to 15 histone-associated genes, this also included 
another 12 genes: TSHZ2, S0X2, TP63, FOS, H2BFS, INSIGl, 
COL4A3, SGKl, DDX58, PJAl, MIB2 and ADD3. The only 
enrichment term in COG_ONTOLOGY was DNA replica- 
tion, recombination and repair, which contained four genes 
(DDX58, IFITl, IFIHl and DDX60). 

Pathway and subpathway enrichment. The DEGs were 
mapped to KEGG pathways to identify the cell signaling path- 
ways influenced by the downstream effectors of NGAL. The 
DEGs were enriched in only four pathways (Table II). 

The local area of an entire pathway was able to be defined 
by multiple subpathways using the node distance k, which aids 
in understanding how the indicated genes affect the pathway 
locally. The DEGs were found to be significantly enriched in 
60 subpathways corresponding to 27 entire pathways using the 
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Table II. Enriched Kyoto Encyclopedia of Genes and Genomes DEG pathways. 



Pathway ID 


Pathway 


annMolecule Ratio" 


P-value 


05322 


Systemic lupus erythematosus 


20/268 


0.0000 


04610 


Complement and coagulation cascades 


5/268 


0.0016 


05210 


Colorectal cancer 


4/268 


0.0075 


00790 


Folate biosynthesis 


2/268 


0.0077 



■'annMolecule ratio is how many genes are enriched in a pathway. The first number indicates the number of annotated DEGs in the pathway. The 
second number signifies the total number of molecules in the pathway. DEG, differentially expressed gene. 
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Figure 2. Functional categories of the differentially expressed genes were visualized using the Enrichment map plugin of the Cytoscape network visualization 
software. A significant functional term is signified by one node with size indicating the enrichment significance P-value. Edges indicate gene overlap between 
nodes and thickness indicates the size of the overlap. Nodes from the same functional category are presented as the same shape. 
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path: 040 10 



C AC MAID 





path:04010_2 



path:04010_5 



NR4A1 



path:04010 8 

Figure 3. Differentially expressed gene-enriched subpathway in the MAPK signaling pathway (path:04010). The six differentially expressed genes are indi- 
cated by a black star in the entire origin pathway. The subpathway structures of path:04010_2, path:04010_5 and path:04010_8 generated by Subpathway 
package are also shown. MAPK, mitogen-activated protein kinase. 



Subpathway Miner package (Table III). Of note, the mitogen-acti- 
vated protein kinase (MAPK) signaling pathway (has: 04010) 
was not detected by the entire pathway enrichment, but was found 



to be significant in the subpathway analysis, with three subpath- 
ways derived from three local areas of this signaling pathway 
(Fig. 3). The subpathway path:04010_2 contained three DEGs; 
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Table III. Enriched Kyoto Encyclopedia of Genes and Genomes subpathways of differentially expressed genes in neutrophil 
gelatinase-associated lipocalin overexpressing EC109 esophageal squamous cell carcinoma cells. 



Entire pathway ID 


Entire pathway 


Subpathway ID 


P-value 


Path:04960 


Aldosterone-regulated sodium reabsorption 


path:04960_3 


0.0161 






A/iA£A '^ 

path:04960_2 


0.0462 


Fatn:U514o 


Amoebiasis 


patn:U514o_8 


0.0124 


Fatn:U4ooz 


B cell receptor signaling pathway 


patn;U4ooz_y 


0 .0002 






path:04662_4 


0 .0005 


Path:05142 


Chagas disease 


path:05142_7 


0.0483 


Fatn:U5z2U 


Chronic myeloid leukemia 


patn:U5zzU_5 


0.0015 


Fatn:U5zlU 


Colorectal cancer 


pam:U5zlU_7 


0.0077 


Fatn:U4olU 


Complement and coagulation cascades 


pam:U4olU_7 


0.0008 






patn:U4ol(J_i 


0 .0043 






patn;U4olU_o 


0 .0043 






patn.U4oiu_4 


U.Uj /j 






patn:U4DlU_z 


0.0403 






path : 046 10_3 


0.0403 






patn:U4olU_5 


A A /I O T 

U.U43z 


Fatn:U4UoU 


Cytokine-cytokine receptor interaction 


pam:U40oU_zz 


A AA 1 C 

0.0015 






. A /f A/CA A A 

patn:(J4(JoU_44 


0.0244 


Fatn:U4o23 


Cytosolic DNA-sensing pathway 


pam:U4oz3_l 


0.0364 


Fatn:U451z 


ECM-receptor interaction 


patn:U451Z_lz 


0 .0064 






patn:U451Z_zl 


0.0364 






path:04512_23 


0.0364 






path : 045 lz_z4 


A A /I 

0.0483 


Fam:(J4(JlZ 


ErbB signaling pathway 


« «> 4-1, . A /I A 1 0 A 

patn:U4Ulz_y 


0.0168 


Fatn:\JU / V(J 


Folate biosynthesis 


*,.r,+l, . AATAA 1 

patn:U(J /v(J_i 


0.0022 






*,.r.+U .AATAA /I 

pam:UU7yU_4 


0 .0022 






A AT A A C 

path:00790_5 


A A A 1 A 

0.0030 






path:00790_2 


A A A A A 

0 .0040 


Fatn:U51oU 


Hepatitis C 


patn:051oU_8 


A AT /t 

0.0364 


Fatn:(J4 / 3(J 


Long-term depression 


patn;(J4/3(J_j 


0.02/1 


Path:04010 


MAPK signalmg pathway 


path : 040 10_5 


0.0161 






„ „ .1 . A /I A 1 A 0 

patn:U4UlU_8 


A AT £. A 






path:04010_2 


A AO AO 

0.0393 


T* il_ A CI 1 O 

Path:05218 


Melanoma 


path:05218_6 


A AOT^ 

0.0322 






patn:U5zlo_3 


0.0492 


Path:(J4D21 


NOD-like receptor signaling pathway 


patn:04ozl_4 


0 .0009 






patn:04DZl_/ 


0.0364 






path:04621_6 


0.0483 


rath:U3//3 


Non-small cell lung cancer 


patn;(JjzZ3_4 


A A /I TO 


Path:U5212 


Pancreatic cancer 


patn:U5zlZ_y 


A A A AC\ 

0.0040 


Path:U52UU 


Pathways in cancer 


*,.r.+U .AC^AA '^C 

patn:U5zUU_z5 


A AA yl A 

0.0040 






4.1, AC^AA 1 0 

path:05200_18 


A AOO /I 

0 .0224 






4.1_ AC^AA T 

path:05200_3 


A AO /I 0 

0.0248 


Path:U4145 


Phagosome 


path:04145_2 


0.0483 


Path:U4D22 


RIG-I-like receptor signaling pathway 


path:04o22_l 


0 .0027 






path:04622_7 


0.0202 






path:04622_3 


0.0296 


Path:05150 


Staphylococcus aureus infection 


path:05150_l 


0.0224 






path:05150_2 


0.0224 






path:05150_7 


0.0348 






path:05150_4 


0.0483 


Path:00140 


Steroid hormone biosynthesis 


path:00140_14 


0.0483 
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Table III. Continued. 



Entire pathway ID 


Entire pathway 


Subpathway ID 


P-value 


Path:04660 


T cell receptor signaling pathway 


path;04660_6 


0 .0064 






path:04oo0_7 


0.010 7 


Path:04350 


TGF-beta signaling pathway 


path:04350_6 


0.0035 






path:04350_4 


0.0142 






path:04350_7 


0.0296 






path:04350_l 


0.0403 






path:04350_8 


0.0462 


Path:04270 


Vascular smooth muscle contraction 


path:04270_13 


0.0483 



ECM, extracellular matrix; MAPK, mitogen-activated protein kinase; NOD, nucleotide-binding oligomerization domain; RIG, retinoic 
acid-inducible gene; TGF, transforming growth factor. 



RASGRP3, KRAS and CACNAID; path:04010_5 contained 
TGFBl and TGFBRl, while path:04010_8 only contained 
NR4A1. Another pathway detected using this analysis was the 
TGF-beta signaling pathway (has:04350), which was not identi- 
fied by entire KEGG pathway enrichment, but four subpathways 
were detected. Path:04350_6 contained DCN, TGFBl and 
TGFBRl. Path:04350_4 and path:04350_7 contained DCN 
and TGFBl while path:04350_l and path:04350_8 contained 
SMAD9 and TGFBRl. 

Promoter sequence patterns and potential transcription 
factors in upregulated and downregulated genes. The spatial 
distribution and abundance of promoter cw-elements affects 
gene expression. The co-expression of upregulated and down- 
regulated genes in NGAL overpressing ECO109 cells was 
considered to be regulated by specific transcription factors at 
the transcriptional level. POCO is a software program that is 
able to identify over-represented and under-represented regula- 
tory patterns among promoter sequence sets of upregulated and 
downregulated genes. In the present study, a total of 52 signifi- 
cant sequence patterns were identified to be over-represented in 
the downregulated genes but comparatively under-represented 
in the upregulated genes, of which the top 20 patterns are 
presented in Table IV. Conversely, 75 patterns were observed 
to be over-represented in the upregulated genes and simultane- 
ously under-represented in the downregulated genes; the top 
20 patterns are shown in Table V. The identified patterns were 
5-8 bp long, containing the four known nucleotides. A, C, 
G and T, while the rest of the places in a pattern, marked as 
N, may be any of these (which are variable). Subsequently, all 
significant patterns were screened with the JASPAR transcrip- 
tion factor database to identify potential transcription factors. 
A total of 11 patterns corresponding to 14 unique transcription 
factors were detected (Fig. 4). Of these potential transcription 
factors. Snail, deltaEFl, Mycn, Arnt, MNBIA, PBF, E74A, Ubx, 
SPIl and GATA2 were unique for the downregulated DEG 
promoters, while bZIP910, ZNF42 and S0X9 were unique for 
the upregulated DEG promoters. These results indicated that 
these transcription factors may be associated with specific 
transcriptional regulation in the downregulated and upregu- 
lated DEGs. Although a number of sequence patterns did not 
correspond to known transcription factors, the possibility and 




cWlaEFl Mytn I ilellaEH / Aral MNBIA ( PBF 



ciimysiiLii 



TGiCTTCTCilCJ 



Figure 4. Sequence pattern logos of the predicted transcription factors. To 
identify recognized transcription factors, the distinguishing significant pat- 
terns were screened using the JASPAR transcription factor database. The 
potential transcription factors regulating downregulated genes are presented 
in the upper two panels, while the transcription factors regulating downregu- 
lated genes are listed in the third panel. 



importance in the regulation of DEGs subsequent to NGAL 
overexpression was not discounted. 

Discussion 

ESCC has one of the highest mortality rates of malignant 
tumors worldwide, particularly in Asia, with an overall five-year 
survival rate <20% (16). NGAL has been shown to be an 
important mediator of invasion and metastasis in ESCC (5,6,10). 
However, for a improved understanding of the role of NGAL in 
ESCC, a comprehensive analysis of the mRNA profile of NGAL 
overexpression ESCC cells was conducted in the present study, 
using multiple bioinformatic analyses. A total of 267 DEGs 
were observed in the NGAL overexpressing cells compared with 
control cells, using a two-fold change as the threshold. To under- 
stand the function of these DEGs, the DEGs were analyzed 
by GO enrichment using DAVID bioinformatics. Several GO 
terms associated with known NGAL functions were detected. 
For example, 21 immune-associated terms were identified, 
including response to stress, defense response and regulation of 
immune response. In the response to stimulus (GO:0050896) 
term, >43 genes were enriched. For example, one of the 
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Table IV. Sequence patterns over-represented in the downregulated genes, but under-represented in the upregulated genes. 



Pattern 


OCCl (#PRO/#TOT) 


OCC2 (#PRO/#TOT) 


F- score 


P-value 


TGNGGNAA 


42 (19/20) 


14(11/18) 


3803.53 


3.33E-04 


CTNNGCTT 


36(19/20) 


12(10/18) 


3370.77 


9.24E-04 


CACNNNTT 


116(20/20) 


58 (18/18) 


3160.89 


1.52E-03 


TTAANG 


107 (20/20) 


42 (13/18) 


3118.93 


1.67E-03 


CTTCNCNC 


43 (19/20) 


13 (9/18) 


3107.02 


1.72E-03 


AAGGNG 


140 (20/20) 


65 (18/18) 


3000.42 


2.21E-03 


CCNCCTT 


54 (20/20) 


19(10/18) 


2823.35 


3.36E-03 


TTAANGNA 


48 (19/20) 


14(9/18) 


2771.02 


3.80E-03 


CTNNCNTA 


71 (20/20) 


35 (15/18) 


2702.61 


4.47E-03 


AANGNGNG 


106 (20/20) 


54(17/18) 


2665.29 


4.88E-03 


GACANNT 


84 (20/20) 


40 (15/18) 


2637.85 


5.21E-03 


AANNNGNG 


372 (20/20) 


265 (18/18) 


2629.26 


5.32E-03 


GNNAAGA 


146 (20/20) 


84(17/18) 


2585.59 


5.90E-03 


CANNCNTT 


104 (20/20) 


50(16/18) 


2579.94 


5.97E-03 


TNTCCNC 


1 49 ('20/20') 


86 n8/18~) 


2575.13 


6 04E-03 


GTGGNNAG 


43 (19/20) 


15 (10/18) 


2562.63 


6.22E-03 


GAAAGNC 


35 (18/20) 


13 (10/18) 


2530.94 


6.71E-03 


CACNCNTT 


31 (19/20) 


10 (8/18) 


2452.32 


8.08E-03 


ACANNTNC 


108 (20/20) 


56(15/18) 


2447.07 


8.18E-03 


GNANNANG 


402 (20/20) 


277 (18/18) 


2380.87 


9.57E-03 



OCC, total number of patterns within the corresponding cluster sequences; OCCl, downregulated DEG promoter sequence set; OCC2, 
upregulated DEG promoter sequence set; PRO, total number of sequences with the pattern in the corresponding cluster; TOT, total number of 
sequences in the corresponding cluster; F-score, analysis of variance between the two input clusters and the background sequence set. DEG, 
differentially expressed gene. 



enriched genes, RAD9, protects against genomic instability by 
activating DNA damage checkpoint and DNA damage repair 
pathways (17). Another enriched gene, DEFBl, is constitutively 
expressed in epithelial tissues, but may be upregulated upon 
receiving inflammatory or microbial stimuli (18). 

Recent studies have observed that NGAL is involved in 
the antibacterial iron-depletion strategy of the innate immune 
system. NGAL binds catecholate-type siderophores, such as 
enterobactin synthesized by E. coli, to arrest E. coli growth 
through inhibiting the iron-uptake ability (19). Several studies 
found NGAL to be critical in the antimicrobial molecular 
response in infections, including Salmonella (20,21), 
Chlamydia (22) and Mycobacterium tuberculosis (23). The 
GO enrichment analysis in the present study suggested that in 
addition to NGAL itself, NGAL downstream effectors exert 
a marked impact on cell immune function and in response to 
other stimuli, including stress and defense responses. 

Of note, 17 histone-associated proteins were upregulated 
in response to NGAL overexpression. The association between 
NGAL and histone-associated proteins had not been reported 
previously, to the best of our knowledge. Therefore, investi- 
gating how NGAL influences chromatin structure and gene 
transcription was of interest. The results of the present study 
provided novel information regarding the role of NGAL in 
gene transcriptional regulation through chromatin organiza- 
tion and nucleosome assembly. 



The functional annotation chart provided a markedly wider 
overview of the biological impact of NGAL overexpression in 
ESCC than traditional GO enrichment. The chart reported 
that five DEGs were found using the hsa04350:TGF-beta 
signaling pathway term, which were not identified by the 
KEGG pathway enrichment analysis. Alport syndrome, which 
contained COL4A4 and COL4A3, was the only enriched term 
from the OMIM_DISEASE category listed in the chart. Urine 
and plasma NGAL have been revealed to be novel biomarkers 
for diagnosis and outcome prediction in renal dysfunction 
conditions, including acute kidney injury, chronic kidney 
disease and renal ischemia-reperfusion injury (24-26). The 
correlation between kidney disease and NGAL interaction 
with downstream effectors was marked. A total of 67 genes 
were enriched in the SP_PIR_KEYWORDS signal term and 
33 of these genes were contained in the Secreted term. 

NGAL is a secreted protein, which forms a complex with 
MMP-9 to prevent its autodegradation, which is critical for 
extracellular matrix remodeling (2). Extracellular NGAL 
has been suggested to cause the secretion of other proteins, 
such as FNl, which regulate the acute inflammatory response, 
cell-matrix adhesion and the defense response (27). Four 
genes, C5, SAAl, CFB and FNl, were enriched in the SP_ 
PIR_KEYWORDS acute phase term. Of note, all four genes 
are defined as positive acute phase proteins, which are consid- 
ered to exert the following general functions: Opsonization 
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Table V. Sequence patterns over-represented in the upregulated genes, but under-represented in the downregulated genes. 



Pattern 


OCCl (#PRO/#TOT) 


OCC2 (#PRO/#TOT) 


F-Score 


P-value 


CTCNA 


276 (20/20) 


355 (18/18) 


5070.50 


9.19E-04 


ACNNCANT 


55 (19/20) 


97(18/18) 


4985.61 


1.05E-03 


CTCA 


331 (20/20) 


476 (18/18) 


4740.31 


1.54E-03 


TNNAGTCC 


10 (10/20) 


31 (18/18) 


4712.78 


1.61E-03 


CAANCT 


56(19/20) 


109 (18/18) 


4363.70 


2.77E-03 


TNCTNAC 


60 (19/20) 


103 (18/18) 


4182.29 


3.68E-03 


TCTCA 


80 (20/20) 


124(18/18) 


4112.35 


4.10E-03 


TNNTNGAG 


66 (20/20) 


111 (18/18) 


4083.41 


4.29E-03 


GGNNTCAA 


15 (12/20) 


42 (18/18) 


3998.49 


4.90E-03 


CTCANT 


79 (19/20) 


130 (18/18) 


3985.14 


5.01E-03 


TGAGNNA 


103 (20/20) 


158(18/18) 


3861.85 


6.07E-03 


CTCAA 


66 (20/20) 


115 (18/18) 


3716.14 


7.63E-03 


ANNGGNGT 


55 (19/20) 


99(18/18) 


3684.44 


8.02E-03 


TTNGAG 


78 (20/20) 


116(18/18) 


3519.73 


1.04E-02 


TGTNANC 


64 ('18/20') 


122 ri8/18~) 


3507.74 


1 06E-02 


ANACC 


213 (20/20) 


11% (18/18) 


3458.47 


1.14E-02 


TGGNNTC 


77 (19/20) 


128(18/18) 


3384.73 


1.28E-02 


CCAANCT 


11 (8/20) 


33 (18/18) 


3379.39 


1.29E-02 


TTGANNC 


53 (19/20) 


93 (18/18) 


3372.46 


1.31E-02 


CCNANNNT 


285 (20/20) 


337 (18/18) 


3362.13 


1.33E-02 



OCC, total number of patterns within the sequences of the corresponding cluster; OCCl, downregulated DEG promoter sequence set; OCC2, 
upregulated DEG promoter sequence set; PRO, total number of sequences with the pattern in the corresponding cluster; TOT, total number of 
sequences in the corresponding cluster; F-score, result of the analysis of variance between the two input clusters and the background sequence 
set. DEG, differentially expressed gene. 



and trapping of microorganisms and associated microbial 
products; binding cellular remnants, such as nuclear fractions; 
scavenging free hemoglobin and radicals; and modulating the 
immune response of the host (28). 

Although an entire pathway may not be identified to be 
statically significant, alterations in local gene expression levels 
may affect the local pathway significantly, which results in a 
marked impact on the biological outcome. Subpathway anal- 
ysis is a powerful method to detect genes in the local area of 
the KEGG pathway. Li et al (29) constructed a drug-metabolic 
subpathway network and found the local region of the tyrosine 
metabolic pathway to be closely associated with the develop- 
ment of lung cancer. A total of 60 subpathways corresponding 
to 27 entire pathways were found in the present study. Several 
subpathway-derived entire pathways were identified using 
this method. For example, the MAPK signaling pathway and 
the TGF-beta signaling pathway were detected. These results 
suggested that although certain DEGs did not significantly 
affect an entire pathway, they did perturb the pathway locally. 
Other proteins in these pathways were not differentially 
expressed at the mRNA level, but this may exclude processes 
such as modification and complex formation, undergone by the 
DEGs. 

DEGs were classified into upregulated and downregulated 
genes as determined by the respective expression levels. 
How these two group genes were co-regulated by distin- 



guishing sequence patterns and transcription factors was 
notable. The POCO software program identifies over- and 
under-represented regulatory patterns among the promoter 
sequence sets of upregulated and downregulated genes. Not 
all DEGs are considered to be modified at the transcriptional 
level; the DEGS may have been differentially expressed due 
to differences in mRNA stability. Thus, in the present study, 
the 20 genes exhibiting the greatest up- or downregulation in 
NGAL overexpressing ESCC cells were analyzed by POCO. 
Hundreds of significant sequence patterns and dozens of tran- 
scription factors were found to be over- and under-represented 
in the downregulation gene set and the upregulation gene set, 
respectively. This suggested that the change in signal trans- 
duction following NGAL overexpression resulted in specific 
transcription factors and/or certain sequence patterns exerting 
critical regulatory roles, to achieve co-regulation of the signifi- 
cantly down- or upregulated genes at the transcriptional level. 

A number of these potential transcription factors have 
previously been reported to be associated with cancer invasion 
or metastasis. Snail and ZEBl (deltaEFl) are predominantly 
involved in the repression of E-cadherin expression, resulting 
in epithelial to mesenchymal transition, which has been 
implicated as the critical event initiating cancer invasion and 
metastasis (30,31). Overexpression of Snail was shown to corre- 
late positively with lymphovascular invasion and was associated 
with poorer overall survival in ESCC patients (32). Nuclear 
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expression of ZEBl was observed in >33% ESCC tumor cells, 
while ZEBl was not detected in the normal adult esophageal 
epithelia (33). PBF was hypothesized to induce the transloca- 
tion of PTTG to the cell nucleus, where it induces tumorigenesis 
via a number of different mechanisms (33). PBF is upregulated 
by estrogen and mediates estrogen-stimulated cell invasion in 
breast cancer cells (34). SPll co-operates with MYC regulating 
the transcription of microRNA-29b, which is important in the 
neutrophil differentiation of acute promyelocytic leukaemia 
cells (35). Notably, NGAL was first identified as a protein stored 
in specific granules of human neutrophils (36). A potential SPIl 
binding site was identified in the promoter region of the NGAL 
gene by computer analysis (37). These results indicated that 
SPll may be the key molecule in biological functions medi- 
ated by NGAL. SOX9, a high-mobility group box transcription 
factor, is required for development, differentiation and lineage 
commitment. Cytoplasmic SOX9 may serve as a valuable 
prognostic marker in invasive ductal carcinoma and metastatic 
breast cancer. The significant correlation identified between 
SOX9 and breast tumor cell proliferation implies that SOX9 
directly contributes to the poor clinical outcomes associated 
with invasive breast cancer (38). These results indicated that 
these transcription factors may be involved in the invasion or 
metastasis mediated by NGAL. Although numerous sequence 
patterns were not matched to known transcription factors, the 
specific base composition suggested that these patterns may be 
crucial in transcriptional regulation. These results indicated that 
these sequence patterns and transcription factors may respond 
to particular transcriptional regulation in downregulated and 
upregulated DEGs. 

In conclusion, in the present study, a comprehensive 
understanding of the role of NGAL in ESCC following NGAL 
overexpression was obtained by multiple bioinformatic anal- 
yses, particularly through analyzing subpathway and sequence 
patterns for co-expression, which provided more information 
than traditional methods. These analytical methods may be used 
to search for novel functional genes and pathways associated 
with the relevant genes identified from high-throughput data. 
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